Assignment 4: Chicago Crimes¶
MSDS420¶
Author: Atef Bader, PhD
Edit: Code refactored by EA with additional changes by EA & SD
Edit: v2.0 Eliminated intro text. Question changes by EA and SD (March 2022)
Edit: v2.1 Added pip install instructions for Mac SD (April 2022)
Edit: v2023 Made 75 points SD (March 2023)
Deliverables:¶
- Submit two files that are labeled: YourLastName_Assignment_4 that have the following formats:
- Your HTML that has your Source code and output
- Your ipynb script that has your Source code and output
Objectives:¶
- Use SQL to execute different queries to retrieve data from Chicago Crime dataset and Police statins dataset
- Use Geospatial queries to locate police stations and gun related crimes (with arrest or no arrest) in every district on Choropleth map
- Use Geospatial queries to provide descriptive stat for every district on Choropleth map
- Use Geospatial queries to locate the Block that is the furthest (Maximum Distance) from the police station that has gun related crime resulted in arrest
Formatting Python Code:¶
When programming in Python, refer to Kenneth Reitz’ PEP 8: The Style Guide for Python Code: http://pep8.org/ (Links to an external site.)Links to an external site. There is the Google style guide for Python at https://google.github.io/styleguide/pyguide.html (Links to an external site.)Links to an external site. Comment often and in detail.
Descriptions and Requirement Specifications¶
The CPD categoizes the crimes into 8 categories as follows:¶
Chicago Crimes Dataset¶
The CSV file for crimes dataset for the city of Chicago is obtained from the data portal for the city of Chicago. Here is the link for the city of Chicago data portal City of Chicago Data Portal
Loading the Dataset CSV file¶
Three set of data are need for this assignment:
- The Chicago police stations in every district (hosted on NW server)
- The Crimes dataset (hosted on NW server)
- The Boundaries.geojson data for district boundries (downloaded in Zip file)
Complete description of the dataset can be found on Chicago city data portal.
Based on Trumps State of the Uniion Address and the article written by columnist Clarence Page and published by the Chicago Tribune, we are interested to retrieve the data for the past two years and perform different types of spatial queries.
There are few of these queries that we are interested in to help CPD and city of Chicago to plot on a Choroplteh map those districts that have highest gun crimes.
Here are examples of those types of queries:
- Plot on Choropleth map the districts and their Violent Crimes
- Plot on Choropleth map the districts and their Gun related crimes
- Which district is the crime capital of Chicago districts?
- What the crime density per district?
- Plot on Choropleth map those gun related crimes that resulted in arrests
- Plot on Choropleth map the gun related crime that is in the farthest Block from the policy stattion for every district
Packages you need to Connect PostgreSQL server to load and retrieve Crhicago Crime dataset from the database:
- psycopg2: for PostgreSQL driver
- area: to calculate the area inside of any GeoJSON geometry
- Folium: for Choropleth maps
Since we are using PostGIS in our work, please read and bookmark Chapter 4. Using PostGIS: Data Management and Queries
# Install a pip package in the current Jupyter kernel
#import sys
#!{sys.executable} -m pip install psycopg2
# if you are on a Mac and the above fails, use code below
#!{sys.executable} -m pip install psycopg2-binary
# Install a pip package in the current Jupyter kernel
#import sys
#!{sys.executable} -m pip install area
# Install a pip package in the current Jupyter kernel
#import sys
#!{sys.executable} -m pip install folium
import folium
from folium import plugins
from folium.plugins import MarkerCluster
import psycopg2
import csv
import pandas as pd
import numpy as np
import json
from area import area
import seaborn as sns
import matplotlib.pyplot as plt
from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT
from IPython.display import display
# set up notebook to display multiple output in one cell
# from IPython.core.interactiveshell import InteractiveShell
# InteractiveShell.ast_node_interactivity = "all"
folium.__version__
'0.15.1'
psycopg2.__version__
'2.9.9 (dt dec pq3 ext lo64)'
IMPORTANT NOTES:¶
- In this assignment you will be connecting to DSCC PostgreSQL server that has the database and tables already created on the server.
- You need to use your NetID and password for login and passward to connect PostgreSQL server hosted on DSCC
- For the psycopg2.connect statements listed below, you must provide your NetID and password in order to connect to PostgreSQL server hosted on DSCC
# Use this data set for your final submission of your Assignment 3
# Uncomment the following line after you unit test your code and ready to run and submit your assignment on this dataset
# db_connection = psycopg2.connect(host='129.105.248.26',dbname="chicago_crimes", user="YourNetID")
# =======================================================================================================
# Use the following dataset for unit testing purposes only. It takes roughly 5 minutes to complete.
# Comment the following line when you are done with your unit testing and ready to run your assignment on the complete dataset and submit your Assignment
db_connection = psycopg2.connect(host = '129.105.248.26',dbname = "chicago_crimes_ut", user = "fwk7866")
db_connection.set_session(autocommit = True)
cursor = db_connection.cursor()
# Uncomment and run this code when your transactions get "stuck"
#db_connection.rollback()
Chicago Crimes Dataset¶
The Crimes_2001_to_present.csv is downloaded from Chicago data portal and it has roughly 6.5 million records.
While working on this dataset, It is prudent to make a note of the following:
- Geospatial queries are very demanding for system resouces like CPU, Memory, and DISK
- We are interested in the data set of the past 2 years, and when you execute Geospatial type queries, please be advised that these queries slow down your machine.
- There are two dataset that you will use in this assignment
- chicago_crimes_ut : Use this mini dataset that has data for rougly a month for coding/debugging and unit testing
- chicago_crimes : Use this dataset that has data for 2 years for your final submission
Algorithm Performance¶
Sort algorithms used by the database engines vary in performance between O($N log N$) and O($ N^{2} $) where $N$ is the size of the number
Search algorithms used by the database engines vary in performance between O($log N$) and O($ N $) where $N$ is the size of the number
What are the column names in the 2 tables that we will use from chicago_crimes database?¶
# Get the column names for table crimes
cursor.execute("SELECT column_name \
FROM INFORMATION_SCHEMA.COLUMNS \
WHERE table_name = 'crimes';")
rows=cursor.fetchall()
rows
[('id',),
('caseno',),
('date_of_occurrence',),
('block',),
('iucr',),
('primary_type',),
('description',),
('location_description',),
('arrest',),
('domestic',),
('beat',),
('district',),
('ward',),
('community_area',),
('fbi_cd',),
('xcoordinate',),
('ycoordinate',),
('year',),
('updated_on',),
('latitude',),
('longitude',),
('location',),
('where_is',)]
# Get the column names for table police_stations
cursor.execute("SELECT column_name \
FROM INFORMATION_SCHEMA.COLUMNS \
WHERE table_name = 'police_stations';")
rows=cursor.fetchall()
rows
[('district',),
('district_name',),
('address',),
('city',),
('state',),
('zip',),
('website',),
('phone',),
('fax',),
('tty',),
('x_coordinate',),
('y_coordinate',),
('latitude',),
('longitude',),
('location',),
('where_is',)]
# Get the first five rows of the police station table
query = "SELECT * \
FROM police_stations"
stations = pd.read_sql_query(query, db_connection)
stations.head()
C:\Users\Ritesh Kumar\AppData\Local\Temp\ipykernel_2512\1681415791.py:4: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy. stations = pd.read_sql_query(query, db_connection)
| district | district_name | address | city | state | zip | website | phone | fax | tty | x_coordinate | y_coordinate | latitude | longitude | location | where_is | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Central | 1718 S State St | Chicago | IL | 60616 | http://home.chicagopolice.org/community/distri... | 312-745-4290 | 312-745-3694 | 312-745-3693 | 1176569.052 | 1891771.704 | 41.858373 | -87.627356 | (41.8583725929, -87.627356171) | 0101000020E61000009AEA2C27DFED444001467E9A26E8... |
| 1 | 2 | Wentworth | 5101 S Wentworth Ave | Chicago | IL | 60609 | http://home.chicagopolice.org/community/distri... | 312-747-8366 | 312-747-5396 | 312-747-6656 | 1175864.837 | 1871153.753 | 41.801811 | -87.630560 | (41.8018110912, -87.6305601801) | 0101000020E6100000608FECBEA1E64440A2D015195BE8... |
| 2 | 3 | Grand Crossing | 7040 S Cottage Grove Ave | Chicago | IL | 60637 | http://home.chicagopolice.org/community/distri... | 312-747-8201 | 312-747-5479 | 312-747-9168 | 1182739.183 | 1858317.732 | 41.766431 | -87.605748 | (41.7664308925, -87.6057478606) | 0101000020E6100000DB984B681AE244409B14AC92C4E6... |
| 3 | 4 | South Chicago | 2255 E 103rd St | Chicago | IL | 60617 | http://home.chicagopolice.org/community/distri... | 312-747-7581 | 312-747-5276 | 312-747-9169 | 1193131.299 | 1837090.265 | 41.707933 | -87.568349 | (41.7079332906, -87.5683491228) | 0101000020E61000005C26DC8E9DDA444010C7FCD45FE4... |
| 4 | 5 | Calumet | 727 E 111th St | Chicago | IL | 60628 | http://home.chicagopolice.org/community/distri... | 312-747-8210 | 312-747-5935 | 312-747-9170 | 1183305.427 | 1831462.313 | 41.692723 | -87.604506 | (41.6927233639, -87.6045058667) | 0101000020E61000000830B828ABD84440F6786339B0E6... |
# Another way to get the police station table
query = "SELECT * \
FROM police_stations"
cursor.execute(query)
data = cursor.fetchall()
colnames = [desc[0] for desc in cursor.description]
# create a data frame with column headers
stations = pd.DataFrame(data,columns = colnames).drop_duplicates()
stations
| district | district_name | address | city | state | zip | website | phone | fax | tty | x_coordinate | y_coordinate | latitude | longitude | location | where_is | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Central | 1718 S State St | Chicago | IL | 60616 | http://home.chicagopolice.org/community/distri... | 312-745-4290 | 312-745-3694 | 312-745-3693 | 1176569.052 | 1891771.704 | 41.858373 | -87.627356 | (41.8583725929, -87.627356171) | 0101000020E61000009AEA2C27DFED444001467E9A26E8... |
| 1 | 2 | Wentworth | 5101 S Wentworth Ave | Chicago | IL | 60609 | http://home.chicagopolice.org/community/distri... | 312-747-8366 | 312-747-5396 | 312-747-6656 | 1175864.837 | 1871153.753 | 41.801811 | -87.630560 | (41.8018110912, -87.6305601801) | 0101000020E6100000608FECBEA1E64440A2D015195BE8... |
| 2 | 3 | Grand Crossing | 7040 S Cottage Grove Ave | Chicago | IL | 60637 | http://home.chicagopolice.org/community/distri... | 312-747-8201 | 312-747-5479 | 312-747-9168 | 1182739.183 | 1858317.732 | 41.766431 | -87.605748 | (41.7664308925, -87.6057478606) | 0101000020E6100000DB984B681AE244409B14AC92C4E6... |
| 3 | 4 | South Chicago | 2255 E 103rd St | Chicago | IL | 60617 | http://home.chicagopolice.org/community/distri... | 312-747-7581 | 312-747-5276 | 312-747-9169 | 1193131.299 | 1837090.265 | 41.707933 | -87.568349 | (41.7079332906, -87.5683491228) | 0101000020E61000005C26DC8E9DDA444010C7FCD45FE4... |
| 4 | 5 | Calumet | 727 E 111th St | Chicago | IL | 60628 | http://home.chicagopolice.org/community/distri... | 312-747-8210 | 312-747-5935 | 312-747-9170 | 1183305.427 | 1831462.313 | 41.692723 | -87.604506 | (41.6927233639, -87.6045058667) | 0101000020E61000000830B828ABD84440F6786339B0E6... |
| 5 | 6 | Gresham | 7808 S Halsted St | Chicago | IL | 60620 | http://home.chicagopolice.org/community/distri... | 312-745-3617 | 312-745-3649 | 312-745-3639 | 1172283.013 | 1853022.646 | 41.752137 | -87.644229 | (41.7521368378, -87.6442289066) | 0101000020E610000059F51C0546E044409DE5E40B3BE9... |
| 6 | 7 | Englewood | 1438 W 63rd St | Chicago | IL | 60636 | http://home.chicagopolice.org/community/distri... | 312-747-8223 | 312-747-6558 | 312-747-6652 | 1167659.235 | 1863005.522 | 41.779632 | -87.660887 | (41.7796315359, -87.6608870173) | 0101000020E6100000749D5FF7CAE344400F5012F94BEA... |
| 7 | 8 | Chicago Lawn | 3420 W 63rd St | Chicago | IL | 60629 | http://home.chicagopolice.org/community/distri... | 312-747-8730 | 312-747-8545 | 312-747-8116 | 1154575.242 | 1862672.049 | 41.778987 | -87.708864 | (41.778987189, -87.7088638153) | 0101000020E6100000C7862CDAB5E34440EE0D5B065EED... |
| 8 | 9 | Deering | 3120 S Halsted St | Chicago | IL | 60608 | http://home.chicagopolice.org/community/distri... | 312-747-8227 | 312-747-5329 | 312-747-9172 | 1171440.240 | 1884085.224 | 41.837394 | -87.646408 | (41.8373944311, -87.6464077068) | 0101000020E6100000EE599DBD2FEB4440449571BE5EE9... |
| 9 | 10 | Ogden | 3315 W Ogden Ave | Chicago | IL | 60623 | http://home.chicagopolice.org/community/distri... | 312-747-7511 | 312-747-7429 | 312-747-7471 | 1154500.753 | 1890985.501 | 41.856685 | -87.708382 | (41.8566845327, -87.708381958) | 0101000020E610000069ABB3D6A7ED4440B0D1492156ED... |
| 10 | 11 | Harrison | 3151 W Harrison St | Chicago | IL | 60612 | http://home.chicagopolice.org/community/distri... | 312-746-8386 | 312-746-4281 | 312-746-5151 | 1155244.069 | 1897148.755 | 41.873582 | -87.705488 | (41.8735822883, -87.705488126) | 0101000020E610000018F5628BD1EF44403D84AFB726ED... |
| 11 | 12 | Near West | 1412 S Blue Island Ave\n | Chicago | IL | 60608 | http://home.chicagopolice.org/community/distri... | 312-746-8396 | 312-746-4248 | 312-746-9868 | 1168487.845 | 1893384.455 | 41.862977 | -87.656973 | (41.8629766244, -87.6569725149) | 0101000020E61000006A0E940476EE44408C346DD60BEA... |
| 12 | 14 | Shakespeare | 2150 N California Ave | Chicago | IL | 60647 | http://home.chicagopolice.org/community/distri... | 312-744-8250 | 312-744-2422 | 312-744-8260 | 1157304.426 | 1914481.521 | 41.921103 | -87.697452 | (41.9211033246, -87.6974518223) | 0101000020E61000008AD1ADB6E6F54440E25BF50CA3EC... |
| 13 | 15 | Austin | 5701 W Madison St | Chicago | IL | 60644 | http://home.chicagopolice.org/community/distri... | 312-743-1440 | 312-743-1366 | 312-743-1485 | 1138148.815 | 1899399.078 | 41.880083 | -87.768200 | (41.8800834614, -87.768199889) | 0101000020E6100000AB392793A6F04440D015DF2F2AF1... |
| 14 | 16 | Jefferson Park | 5151 N Milwaukee Ave | Chicago | IL | 60630 | http://home.chicagopolice.org/community/distri... | 312-742-4480 | 312-742-4421 | 312-742-4423 | 1138480.758 | 1933660.473 | 41.974094 | -87.766149 | (41.9740944511, -87.7661488432) | 0101000020E610000089FB7E20AFFC4440C6EA249508F1... |
| 15 | 17 | Albany Park | 4650 N Pulaski Rd | Chicago | IL | 60630 | http://home.chicagopolice.org/community/distri... | 312-742-4410 | 312-742-5411 | 312-742-5451 | 1148843.910 | 1930801.058 | 41.966053 | -87.728115 | (41.9660534171, -87.728114561) | 0101000020E6100000638B72A3A7FB444040BCCF6D99EE... |
| 16 | 18 | Near North | 1160 N Larrabee St | Chicago | IL | 60610 | http://home.chicagopolice.org/community/distri... | 312-742-5870 | 312-742-5771 | 312-742-5773 | 1172080.029 | 1908086.527 | 41.903242 | -87.643352 | (41.9032416531, -87.6433521393) | 0101000020E61000004A91216C9DF34440254774AE2CE9... |
| 17 | 19 | Town Hall | 850 W Addison St | Chicago | IL | 60613 | http://home.chicagopolice.org/community/distri... | 312-744-8320 | 312-744-4481 | 312-744-8011 | 1169730.744 | 1924160.317 | 41.947400 | -87.651512 | (41.9474004564, -87.651512018) | 0101000020E61000002AF5136B44F9444075B6785FB2E9... |
| 18 | 20 | Lincoln | 5400 N Lincoln Ave | Chicago | IL | 60625 | http://home.chicagopolice.org/community/distri... | 312-742-8714 | 312-742-8803 | 312-742-8841 | 1158399.146 | 1935788.826 | 41.979550 | -87.692845 | (41.9795495131, -87.6928445094) | 0101000020E6100000A421DBE061FD444071EA7F9057EC... |
| 19 | 22 | Morgan Park | 1900 W Monterey Ave | Chicago | IL | 60643 | http://home.chicagopolice.org/community/distri... | 312-745-0710 | 312-745-0814 | 312-745-0569 | 1165825.476 | 1830851.333 | 41.691435 | -87.668520 | (41.6914347795, -87.6685203937) | 0101000020E61000005FB553EF80D8444096F0BE09C9EA... |
| 20 | 24 | Rogers Park | 6464 N Clark St | Chicago | IL | 60626 | http://home.chicagopolice.org/community/distri... | 312-744-5907 | 312-744-6928 | 312-744-7603 | 1164193.588 | 1943199.401 | 41.999763 | -87.671324 | (41.9997634842, -87.6713242922) | 0101000020E6100000E62AED3FF8FF4440DEA327FAF6EA... |
| 21 | 25 | Grand Central | 5555 W Grand Ave | Chicago | IL | 60639 | http://home.chicagopolice.org/community/distri... | 312-746-8605 | 312-746-4353 | 312-746-8383 | 1138770.871 | 1913442.439 | 41.918609 | -87.765574 | (41.9186088912, -87.765574479) | 0101000020E6100000B42EE2F994F5444024901A2CFFF0... |
| 22 | Headquarters | Headquarters | 3510 S Michigan Ave | Chicago | IL | 60653 | http://home.chicagopolice.org | 1177731.401 | 1881697.404 | 41.830702 | -87.623395 | (41.8307016873, -87.6233953459) | 0101000020E610000016A4D76E54EA44409F2E9CB5E5E7... |
# store districts (excluding Headquarters) for later use
districts = stations.query("district != 'Headquarters'").district
districts
0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11 11 12 12 14 13 15 14 16 15 17 16 18 17 19 18 20 19 22 20 24 21 25 Name: district, dtype: object
# Display the first 5 rows in the crimes table
query = "SELECT * \
FROM crimes"
chicago = pd.read_sql_query(query, db_connection)
chicago.head()
C:\Users\Ritesh Kumar\AppData\Local\Temp\ipykernel_2512\3352474281.py:5: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy. chicago = pd.read_sql_query(query, db_connection)
| id | caseno | date_of_occurrence | block | iucr | primary_type | description | location_description | arrest | domestic | ... | community_area | fbi_cd | xcoordinate | ycoordinate | year | updated_on | latitude | longitude | location | where_is | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 24368 | JC103046 | 2019-01-03 14:37:00 | 050XX S INDIANA AVE | 110 | HOMICIDE | FIRST DEGREE MURDER | AUTO | False | False | ... | 38 | 01A | 1178455.0 | 1871575.0 | 2019 | 2019-01-10 15:16:00 | 41.802909 | -87.621048 | (41.802908524, -87.62104831) | 0101000020E6100000368AD5B4C5E64440E92B6941BFE7... |
| 1 | 24369 | JC103046 | 2019-01-03 14:37:00 | 050XX S INDIANA AVE | 110 | HOMICIDE | FIRST DEGREE MURDER | AUTO | False | False | ... | 38 | 01A | 1178455.0 | 1871575.0 | 2019 | 2019-01-10 15:16:00 | 41.802909 | -87.621048 | (41.802908524, -87.62104831) | 0101000020E6100000368AD5B4C5E64440E92B6941BFE7... |
| 2 | 24370 | JC105864 | 2019-01-05 16:23:00 | 012XX N ADA ST | 110 | HOMICIDE | FIRST DEGREE MURDER | STREET | False | False | ... | 24 | 01A | 1167119.0 | 1908611.0 | 2019 | 2019-01-18 09:37:00 | 41.904789 | -87.661560 | (41.904788891, -87.66155984) | 0101000020E6100000C52A521FD0F34440684915FF56EA... |
| 3 | 24372 | JC107385 | 2019-01-06 22:23:00 | 060XX W 63RD ST | 110 | HOMICIDE | FIRST DEGREE MURDER | STREET | False | False | ... | 64 | 01A | 1137054.0 | 1862130.0 | 2019 | 2019-01-18 09:37:00 | 41.777831 | -87.773112 | (41.77783091, -87.773112387) | 0101000020E61000004B2298F68FE34440ACCB63AC7AF1... |
| 4 | 24373 | JC108358 | 2019-01-07 15:45:00 | 002XX W 87TH ST | 110 | HOMICIDE | FIRST DEGREE MURDER | PARKING LOT | True | False | ... | 44 | 01A | 1176442.0 | 1847223.0 | 2019 | 2019-01-18 09:37:00 | 41.736129 | -87.629162 | (41.736129473, -87.629162044) | 0101000020E6100000CBA28F7D39DE4440D16BDC3044E8... |
5 rows × 23 columns
# Another way to get the crimes table
query = "SELECT * \
FROM crimes"
cursor.execute(query)
data = cursor.fetchall()
colnames = [desc[0] for desc in cursor.description]
# create a data frame with column headers
chicago = pd.DataFrame(data,columns = colnames)
chicago.columns
Index(['id', 'caseno', 'date_of_occurrence', 'block', 'iucr', 'primary_type',
'description', 'location_description', 'arrest', 'domestic', 'beat',
'district', 'ward', 'community_area', 'fbi_cd', 'xcoordinate',
'ycoordinate', 'year', 'updated_on', 'latitude', 'longitude',
'location', 'where_is'],
dtype='object')
chicago.head()
| id | caseno | date_of_occurrence | block | iucr | primary_type | description | location_description | arrest | domestic | ... | community_area | fbi_cd | xcoordinate | ycoordinate | year | updated_on | latitude | longitude | location | where_is | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 24368 | JC103046 | 2019-01-03 14:37:00 | 050XX S INDIANA AVE | 110 | HOMICIDE | FIRST DEGREE MURDER | AUTO | False | False | ... | 38 | 01A | 1178455.0 | 1871575.0 | 2019 | 2019-01-10 15:16:00 | 41.802909 | -87.621048 | (41.802908524, -87.62104831) | 0101000020E6100000368AD5B4C5E64440E92B6941BFE7... |
| 1 | 24369 | JC103046 | 2019-01-03 14:37:00 | 050XX S INDIANA AVE | 110 | HOMICIDE | FIRST DEGREE MURDER | AUTO | False | False | ... | 38 | 01A | 1178455.0 | 1871575.0 | 2019 | 2019-01-10 15:16:00 | 41.802909 | -87.621048 | (41.802908524, -87.62104831) | 0101000020E6100000368AD5B4C5E64440E92B6941BFE7... |
| 2 | 24370 | JC105864 | 2019-01-05 16:23:00 | 012XX N ADA ST | 110 | HOMICIDE | FIRST DEGREE MURDER | STREET | False | False | ... | 24 | 01A | 1167119.0 | 1908611.0 | 2019 | 2019-01-18 09:37:00 | 41.904789 | -87.661560 | (41.904788891, -87.66155984) | 0101000020E6100000C52A521FD0F34440684915FF56EA... |
| 3 | 24372 | JC107385 | 2019-01-06 22:23:00 | 060XX W 63RD ST | 110 | HOMICIDE | FIRST DEGREE MURDER | STREET | False | False | ... | 64 | 01A | 1137054.0 | 1862130.0 | 2019 | 2019-01-18 09:37:00 | 41.777831 | -87.773112 | (41.77783091, -87.773112387) | 0101000020E61000004B2298F68FE34440ACCB63AC7AF1... |
| 4 | 24373 | JC108358 | 2019-01-07 15:45:00 | 002XX W 87TH ST | 110 | HOMICIDE | FIRST DEGREE MURDER | PARKING LOT | True | False | ... | 44 | 01A | 1176442.0 | 1847223.0 | 2019 | 2019-01-18 09:37:00 | 41.736129 | -87.629162 | (41.736129473, -87.629162044) | 0101000020E6100000CBA28F7D39DE4440D16BDC3044E8... |
5 rows × 23 columns
DATA DICTIONARY (CRIMES) (See https://www.kaggle.com/currie32/crimes-in-chicago)
id - Unique identifier for the record.
caseno - The Chicago Police Department RD Number (Records Division Number), which is unique to the incident.
date_of_occurence - Date when the incident occurred. this is sometimes a best estimate.
**block - The partially redacted address where the incident occurred, placing it on the same block as the actual address.
iucr - The Illinois Unifrom Crime Reporting code. This is directly linked to the Primary Type and Description. See the list of IUCR codes at https://data.cityofchicago.org/d/c7ck-438e.
primary_type - The primary description of the IUCR code.
description - The secondary description of the IUCR code, a subcategory of the primary description.
location-description - Description of the location where the incident occurred.
arrest - Indicates whether an arrest was made.
domestic - Indicates whether the incident was domestic-related as defined by the Illinois Domestic Violence Act.
beat - Indicates the beat where the incident occurred. A beat is the smallest police geographic area – each beat has a dedicated police beat car. Three to five beats make up a police sector, and three sectors make up a police district. The Chicago Police Department has 22 police districts. See the beats at https://data.cityofchicago.org/d/aerh-rz74.
district - Indicates the police district where the incident occurred. See the districts at https://data.cityofchicago.org/d/fthy-xz3r.
ward - The ward (City Council district) where the incident occurred. See the wards at https://data.cityofchicago.org/d/sp34-6z76.
community_area - Indicates the community area where the incident occurred. Chicago has 77 community areas. See the community areas at https://data.cityofchicago.org/d/cauq-8yn6.
fbi_cd - Indicates the crime classification as outlined in the FBI's National Incident-Based Reporting System (NIBRS). See the Chicago Police Department listing of these classifications at http://gis.chicagopolice.org/clearmap_crime_sums/crime_types.html.
xcoordinate - The x coordinate of the location where the incident occurred in State Plane Illinois East NAD 1983 projection. This location is shifted from the actual location for partial redaction but falls on the same block.
ycoordinate - The y coordinate of the location where the incident occurred in State Plane Illinois East NAD 1983 projection. This location is shifted from the actual location for partial redaction but falls on the same block.
year - Year the incident occurred.
updated_on - Date and time the record was last updated.
latitude - The latitude of the location where the incident occurred. This location is shifted from the actual location for partial redaction but falls on the same block.
longitude - The longitude of the location where the incident occurred. This location is shifted from the actual location for partial redaction but falls on the same block.
location - The location where the incident occurred in a format that allows for creation of maps and other geographic operations on this data portal. This location is shifted from the actual location for partial redaction but falls on the same block.
chicago.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 14436 entries, 0 to 14435 Data columns (total 23 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 id 14436 non-null int64 1 caseno 14436 non-null object 2 date_of_occurrence 14436 non-null datetime64[ns] 3 block 14436 non-null object 4 iucr 14436 non-null object 5 primary_type 14436 non-null object 6 description 14436 non-null object 7 location_description 14436 non-null object 8 arrest 14436 non-null bool 9 domestic 14436 non-null bool 10 beat 14436 non-null int64 11 district 14436 non-null int64 12 ward 14436 non-null float64 13 community_area 14436 non-null object 14 fbi_cd 14436 non-null object 15 xcoordinate 14436 non-null float64 16 ycoordinate 14436 non-null float64 17 year 14436 non-null int64 18 updated_on 14436 non-null datetime64[ns] 19 latitude 14436 non-null float64 20 longitude 14436 non-null float64 21 location 14436 non-null object 22 where_is 14436 non-null object dtypes: bool(2), datetime64[ns](2), float64(5), int64(4), object(10) memory usage: 2.3+ MB
chicago.head()
| id | caseno | date_of_occurrence | block | iucr | primary_type | description | location_description | arrest | domestic | ... | community_area | fbi_cd | xcoordinate | ycoordinate | year | updated_on | latitude | longitude | location | where_is | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 24368 | JC103046 | 2019-01-03 14:37:00 | 050XX S INDIANA AVE | 110 | HOMICIDE | FIRST DEGREE MURDER | AUTO | False | False | ... | 38 | 01A | 1178455.0 | 1871575.0 | 2019 | 2019-01-10 15:16:00 | 41.802909 | -87.621048 | (41.802908524, -87.62104831) | 0101000020E6100000368AD5B4C5E64440E92B6941BFE7... |
| 1 | 24369 | JC103046 | 2019-01-03 14:37:00 | 050XX S INDIANA AVE | 110 | HOMICIDE | FIRST DEGREE MURDER | AUTO | False | False | ... | 38 | 01A | 1178455.0 | 1871575.0 | 2019 | 2019-01-10 15:16:00 | 41.802909 | -87.621048 | (41.802908524, -87.62104831) | 0101000020E6100000368AD5B4C5E64440E92B6941BFE7... |
| 2 | 24370 | JC105864 | 2019-01-05 16:23:00 | 012XX N ADA ST | 110 | HOMICIDE | FIRST DEGREE MURDER | STREET | False | False | ... | 24 | 01A | 1167119.0 | 1908611.0 | 2019 | 2019-01-18 09:37:00 | 41.904789 | -87.661560 | (41.904788891, -87.66155984) | 0101000020E6100000C52A521FD0F34440684915FF56EA... |
| 3 | 24372 | JC107385 | 2019-01-06 22:23:00 | 060XX W 63RD ST | 110 | HOMICIDE | FIRST DEGREE MURDER | STREET | False | False | ... | 64 | 01A | 1137054.0 | 1862130.0 | 2019 | 2019-01-18 09:37:00 | 41.777831 | -87.773112 | (41.77783091, -87.773112387) | 0101000020E61000004B2298F68FE34440ACCB63AC7AF1... |
| 4 | 24373 | JC108358 | 2019-01-07 15:45:00 | 002XX W 87TH ST | 110 | HOMICIDE | FIRST DEGREE MURDER | PARKING LOT | True | False | ... | 44 | 01A | 1176442.0 | 1847223.0 | 2019 | 2019-01-18 09:37:00 | 41.736129 | -87.629162 | (41.736129473, -87.629162044) | 0101000020E6100000CBA28F7D39DE4440D16BDC3044E8... |
5 rows × 23 columns
# use value_counts to see the values of the field
chicago['year'].value_counts()
year 2019 14436 Name: count, dtype: int64
# what is in the field domestic
chicago['domestic'].value_counts()
domestic False 11974 True 2462 Name: count, dtype: int64
# let's look at how many crimes are domestic by primary type
pd.crosstab(chicago['primary_type'], chicago['domestic'], margins=True)
| domestic | False | True | All |
|---|---|---|---|
| primary_type | |||
| ARSON | 18 | 1 | 19 |
| ASSAULT | 774 | 317 | 1091 |
| BATTERY | 1328 | 1398 | 2726 |
| BURGLARY | 630 | 7 | 637 |
| CONCEALED CARRY LICENSE VIOLATION | 7 | 0 | 7 |
| CRIM SEXUAL ASSAULT | 79 | 15 | 94 |
| CRIMINAL DAMAGE | 1289 | 168 | 1457 |
| CRIMINAL TRESPASS | 436 | 25 | 461 |
| DECEPTIVE PRACTICE | 785 | 5 | 790 |
| GAMBLING | 3 | 0 | 3 |
| HOMICIDE | 15 | 0 | 15 |
| HUMAN TRAFFICKING | 3 | 0 | 3 |
| INTERFERENCE WITH PUBLIC OFFICER | 81 | 0 | 81 |
| INTIMIDATION | 12 | 4 | 16 |
| KIDNAPPING | 4 | 0 | 4 |
| LIQUOR LAW VIOLATION | 11 | 0 | 11 |
| MOTOR VEHICLE THEFT | 483 | 5 | 488 |
| NARCOTICS | 933 | 0 | 933 |
| NON-CRIMINAL | 1 | 0 | 1 |
| OBSCENITY | 0 | 1 | 1 |
| OFFENSE INVOLVING CHILDREN | 50 | 49 | 99 |
| OTHER OFFENSE | 678 | 326 | 1004 |
| PROSTITUTION | 58 | 0 | 58 |
| PUBLIC PEACE VIOLATION | 63 | 1 | 64 |
| ROBBERY | 513 | 11 | 524 |
| SEX OFFENSE | 41 | 3 | 44 |
| STALKING | 7 | 2 | 9 |
| THEFT | 3300 | 124 | 3424 |
| WEAPONS VIOLATION | 372 | 0 | 372 |
| All | 11974 | 2462 | 14436 |
# percentages would be more informative
# ------ lambda expressions section 4.7.6 ('syntactic sugar for a normal function definition')
# https://docs.python.org/3/tutorial/controlflow.html
pd.crosstab(chicago['primary_type'], chicago['domestic']).apply(lambda x: x/x.sum(), axis = 1)
| domestic | False | True |
|---|---|---|
| primary_type | ||
| ARSON | 0.947368 | 0.052632 |
| ASSAULT | 0.709441 | 0.290559 |
| BATTERY | 0.487161 | 0.512839 |
| BURGLARY | 0.989011 | 0.010989 |
| CONCEALED CARRY LICENSE VIOLATION | 1.000000 | 0.000000 |
| CRIM SEXUAL ASSAULT | 0.840426 | 0.159574 |
| CRIMINAL DAMAGE | 0.884695 | 0.115305 |
| CRIMINAL TRESPASS | 0.945770 | 0.054230 |
| DECEPTIVE PRACTICE | 0.993671 | 0.006329 |
| GAMBLING | 1.000000 | 0.000000 |
| HOMICIDE | 1.000000 | 0.000000 |
| HUMAN TRAFFICKING | 1.000000 | 0.000000 |
| INTERFERENCE WITH PUBLIC OFFICER | 1.000000 | 0.000000 |
| INTIMIDATION | 0.750000 | 0.250000 |
| KIDNAPPING | 1.000000 | 0.000000 |
| LIQUOR LAW VIOLATION | 1.000000 | 0.000000 |
| MOTOR VEHICLE THEFT | 0.989754 | 0.010246 |
| NARCOTICS | 1.000000 | 0.000000 |
| NON-CRIMINAL | 1.000000 | 0.000000 |
| OBSCENITY | 0.000000 | 1.000000 |
| OFFENSE INVOLVING CHILDREN | 0.505051 | 0.494949 |
| OTHER OFFENSE | 0.675299 | 0.324701 |
| PROSTITUTION | 1.000000 | 0.000000 |
| PUBLIC PEACE VIOLATION | 0.984375 | 0.015625 |
| ROBBERY | 0.979008 | 0.020992 |
| SEX OFFENSE | 0.931818 | 0.068182 |
| STALKING | 0.777778 | 0.222222 |
| THEFT | 0.963785 | 0.036215 |
| WEAPONS VIOLATION | 1.000000 | 0.000000 |
Lets start executing different Queries¶
Query #1:¶
- Calculate the total number of crimes in every district and plot that on Choropleth map. In other words, create a Choropleth map where the districts are shaded in proportion to the number of crimes in that district.
# Get the total number of crimes per district.
# Since each row corresponds to a crime, we just need to count the number of rows per district.
cursor.execute("SELECT district, count(district) \
FROM crimes \
GROUP BY district \
ORDER BY district" )
rows = cursor.fetchall()
rows
[(1, 854), (2, 602), (3, 695), (4, 790), (5, 635), (6, 844), (7, 745), (8, 886), (9, 585), (10, 779), (11, 1113), (12, 700), (14, 508), (15, 530), (16, 473), (17, 386), (18, 869), (19, 608), (20, 254), (22, 442), (24, 412), (25, 726)]
# Create a dataframe from the results of the query.
crimes_per_district = pd.DataFrame(rows, columns=['dist_num','number_of_crimes'])
crimes_per_district['dist_num'] = crimes_per_district['dist_num'].astype(str)
crimes_per_district
| dist_num | number_of_crimes | |
|---|---|---|
| 0 | 1 | 854 |
| 1 | 2 | 602 |
| 2 | 3 | 695 |
| 3 | 4 | 790 |
| 4 | 5 | 635 |
| 5 | 6 | 844 |
| 6 | 7 | 745 |
| 7 | 8 | 886 |
| 8 | 9 | 585 |
| 9 | 10 | 779 |
| 10 | 11 | 1113 |
| 11 | 12 | 700 |
| 12 | 14 | 508 |
| 13 | 15 | 530 |
| 14 | 16 | 473 |
| 15 | 17 | 386 |
| 16 | 18 | 869 |
| 17 | 19 | 608 |
| 18 | 20 | 254 |
| 19 | 22 | 442 |
| 20 | 24 | 412 |
| 21 | 25 | 726 |
# For each district, excluding HQ, we get the latitude/longitude of the police station at the district.
# With a few exceptions, will be inserting markers on the map at these police station locations
cursor.execute("""SELECT ST_X(ST_AsText(Where_IS)), ST_Y(ST_AsText(Where_IS)), district, district_name \
FROM police_stations \
WHERE district!='Headquarters'""")
police_stations = cursor.fetchall()
# Create a dataframe with this information.
station_locations = pd.DataFrame(police_stations,columns=['st_latitude','st_longitude','dist_num', 'district_name']).drop_duplicates()
station_locations
| st_latitude | st_longitude | dist_num | district_name | |
|---|---|---|---|---|
| 0 | 41.858373 | -87.627356 | 1 | Central |
| 1 | 41.801811 | -87.630560 | 2 | Wentworth |
| 2 | 41.766431 | -87.605748 | 3 | Grand Crossing |
| 3 | 41.707933 | -87.568349 | 4 | South Chicago |
| 4 | 41.692723 | -87.604506 | 5 | Calumet |
| 5 | 41.752137 | -87.644229 | 6 | Gresham |
| 6 | 41.779632 | -87.660887 | 7 | Englewood |
| 7 | 41.778987 | -87.708864 | 8 | Chicago Lawn |
| 8 | 41.837394 | -87.646408 | 9 | Deering |
| 9 | 41.856685 | -87.708382 | 10 | Ogden |
| 10 | 41.873582 | -87.705488 | 11 | Harrison |
| 11 | 41.862977 | -87.656973 | 12 | Near West |
| 12 | 41.921103 | -87.697452 | 14 | Shakespeare |
| 13 | 41.880083 | -87.768200 | 15 | Austin |
| 14 | 41.974094 | -87.766149 | 16 | Jefferson Park |
| 15 | 41.966053 | -87.728115 | 17 | Albany Park |
| 16 | 41.903242 | -87.643352 | 18 | Near North |
| 17 | 41.947400 | -87.651512 | 19 | Town Hall |
| 18 | 41.979550 | -87.692845 | 20 | Lincoln |
| 19 | 41.691435 | -87.668520 | 22 | Morgan Park |
| 20 | 41.999763 | -87.671324 | 24 | Rogers Park |
| 21 | 41.918609 | -87.765574 | 25 | Grand Central |
# Add columns for the coordinates of each distract. Again, this is going to be the locations of our markers.
query1_df = pd.merge(crimes_per_district, station_locations,on='dist_num')
query1_df
| dist_num | number_of_crimes | st_latitude | st_longitude | district_name | |
|---|---|---|---|---|---|
| 0 | 1 | 854 | 41.858373 | -87.627356 | Central |
| 1 | 2 | 602 | 41.801811 | -87.630560 | Wentworth |
| 2 | 3 | 695 | 41.766431 | -87.605748 | Grand Crossing |
| 3 | 4 | 790 | 41.707933 | -87.568349 | South Chicago |
| 4 | 5 | 635 | 41.692723 | -87.604506 | Calumet |
| 5 | 6 | 844 | 41.752137 | -87.644229 | Gresham |
| 6 | 7 | 745 | 41.779632 | -87.660887 | Englewood |
| 7 | 8 | 886 | 41.778987 | -87.708864 | Chicago Lawn |
| 8 | 9 | 585 | 41.837394 | -87.646408 | Deering |
| 9 | 10 | 779 | 41.856685 | -87.708382 | Ogden |
| 10 | 11 | 1113 | 41.873582 | -87.705488 | Harrison |
| 11 | 12 | 700 | 41.862977 | -87.656973 | Near West |
| 12 | 14 | 508 | 41.921103 | -87.697452 | Shakespeare |
| 13 | 15 | 530 | 41.880083 | -87.768200 | Austin |
| 14 | 16 | 473 | 41.974094 | -87.766149 | Jefferson Park |
| 15 | 17 | 386 | 41.966053 | -87.728115 | Albany Park |
| 16 | 18 | 869 | 41.903242 | -87.643352 | Near North |
| 17 | 19 | 608 | 41.947400 | -87.651512 | Town Hall |
| 18 | 20 | 254 | 41.979550 | -87.692845 | Lincoln |
| 19 | 22 | 442 | 41.691435 | -87.668520 | Morgan Park |
| 20 | 24 | 412 | 41.999763 | -87.671324 | Rogers Park |
| 21 | 25 | 726 | 41.918609 | -87.765574 | Grand Central |
Creating the folium maps¶
# Create a folium map centered on "downtown Chicago"
total_number_of_crimes_per_district_map = folium.Map(location = (41.8781, -87.6298),zoom_start = 11)
# Create a folium map centered on "downtown Chicago". The boundaries of each distrct are stored in "Boundaries.geojson".
# Use this data to draw the district regions on the folium map.
# Then shade each region "orange red" with the intensity of the color corresponding to the number of crimes.
# Remember the number of crimes per district is stored in the number_of_crimes column of crimes_per_district.
# We add this "choropleth layer" to our folium map.
folium.Choropleth(geo_data = "Boundaries.geojson",
fill_color = 'OrRd',
fill_opacity =0.5,
line_opacity= 1,
data = crimes_per_district,
key_on = 'feature.properties.dist_num',
columns = ['dist_num', 'number_of_crimes'],
legend_name = "CRIMES PER DISTRICT"
).add_to(total_number_of_crimes_per_district_map)
<folium.features.Choropleth at 0x205094867e0>
# We iterate over the rows in the dataframe creating one marker per row.
# The ("popup") label for each district marker displays the district number and number of crimes.
# Each distric marker is added to the map at the district station's location.
for index, row in query1_df.iterrows():
# location of police station
police_station_location = (row['st_latitude'],row['st_longitude'])
district = row['dist_num']
num_crimes = row['number_of_crimes']
# Create the marker and add it to the map.
folium.Marker(location = police_station_location,
popup =\
folium.Popup(html=f"District No : {district} has Total Number of Crimes: {num_crimes}",max_width=450)).\
add_to(total_number_of_crimes_per_district_map)
- Lets plot the Choropleth map and notice the intensity of color on the different districts
- The Blue POPUP represents the location of police station in the different districts in the map
# Display the finished map.
total_number_of_crimes_per_district_map
Query #2:¶
- Calculate the total number of violent crimes in every district and plot that in a table on Choropleth map. In other words, create a Choropleth map where the districts are shaded in proportion to the number of violent crimes in that district.
Note: A crime is considered a violent crime if the PRIMARY_TYPE of the crimes is THEFT ,ASSAULT,ROBBERY,KIDNAPPING,CRIM SEXUAL ASSAULT,BATTERY, or MURDER.
- Then find the total number of crimes in the district for each of these primary types of violent crime and add a popup marker (located at that district's police headquarter) that displays a DataFrame containing this data.
violent_crime_categories = 'THEFT','ASSAULT','ROBBERY','KIDNAPPING','CRIM SEXUAL ASSAULT','BATTERY','MURDER'
# Get the total number of violent crimes per district. This info will populate the legend.
cursor.execute("SELECT district, count(district)\
FROM crimes \
WHERE PRIMARY_TYPE in %s \
GROUP BY district \
ORDER BY district", [violent_crime_categories])
rows=cursor.fetchall()
violent_crime_data=pd.DataFrame(rows, columns=['dist_num','number_of_violent_crimes'])
violent_crime_data['dist_num'] = violent_crime_data['dist_num'].astype(str)
violent_crime_data
| dist_num | number_of_violent_crimes | |
|---|---|---|
| 0 | 1 | 624 |
| 1 | 2 | 349 |
| 2 | 3 | 387 |
| 3 | 4 | 400 |
| 4 | 5 | 293 |
| 5 | 6 | 457 |
| 6 | 7 | 367 |
| 7 | 8 | 488 |
| 8 | 9 | 312 |
| 9 | 10 | 362 |
| 10 | 11 | 407 |
| 11 | 12 | 430 |
| 12 | 14 | 309 |
| 13 | 15 | 291 |
| 14 | 16 | 252 |
| 15 | 17 | 220 |
| 16 | 18 | 605 |
| 17 | 19 | 368 |
| 18 | 20 | 147 |
| 19 | 22 | 217 |
| 20 | 24 | 216 |
| 21 | 25 | 362 |
Now, lets create a dataframe of the different types of violent crimes for every district
cursor.execute(f"SELECT district, PRIMARY_TYPE, count(PRIMARY_TYPE) \
FROM crimes \
WHERE PRIMARY_TYPE in {violent_crime_categories} \
GROUP BY district,PRIMARY_TYPE \
ORDER BY district,PRIMARY_TYPE")
rows=cursor.fetchall()
violent_crime_type = pd.DataFrame(rows, columns=['dist_num','Description','number_of_violent_crimes'])
violent_crime_type['dist_num'] = violent_crime_type['dist_num'].astype(str)
violent_crime_type
| dist_num | Description | number_of_violent_crimes | |
|---|---|---|---|
| 0 | 1 | ASSAULT | 46 |
| 1 | 1 | BATTERY | 93 |
| 2 | 1 | CRIM SEXUAL ASSAULT | 2 |
| 3 | 1 | ROBBERY | 19 |
| 4 | 1 | THEFT | 464 |
| ... | ... | ... | ... |
| 108 | 25 | ASSAULT | 56 |
| 109 | 25 | BATTERY | 132 |
| 110 | 25 | CRIM SEXUAL ASSAULT | 4 |
| 111 | 25 | ROBBERY | 27 |
| 112 | 25 | THEFT | 143 |
113 rows × 3 columns
# This time our dataframe will contain one row for each district and each type of violent crime.
# For each distrct number and type of violent crime our dataframe will have the total of violent crimes of that type.
# We once again include the stations location since that is where the markers will appear on the map.
query2_df = pd.merge(violent_crime_type, station_locations,on='dist_num')
query2_df
| dist_num | Description | number_of_violent_crimes | st_latitude | st_longitude | district_name | |
|---|---|---|---|---|---|---|
| 0 | 1 | ASSAULT | 46 | 41.858373 | -87.627356 | Central |
| 1 | 1 | BATTERY | 93 | 41.858373 | -87.627356 | Central |
| 2 | 1 | CRIM SEXUAL ASSAULT | 2 | 41.858373 | -87.627356 | Central |
| 3 | 1 | ROBBERY | 19 | 41.858373 | -87.627356 | Central |
| 4 | 1 | THEFT | 464 | 41.858373 | -87.627356 | Central |
| ... | ... | ... | ... | ... | ... | ... |
| 108 | 25 | ASSAULT | 56 | 41.918609 | -87.765574 | Grand Central |
| 109 | 25 | BATTERY | 132 | 41.918609 | -87.765574 | Grand Central |
| 110 | 25 | CRIM SEXUAL ASSAULT | 4 | 41.918609 | -87.765574 | Grand Central |
| 111 | 25 | ROBBERY | 27 | 41.918609 | -87.765574 | Grand Central |
| 112 | 25 | THEFT | 143 | 41.918609 | -87.765574 | Grand Central |
113 rows × 6 columns
Create a folium map centered on "downtown Chicago". The boundaries of each distrct are stored in "Boundaries.geojson". Use this data to draw the district regions on the folium map. Then shade each region "orange red" with the intensity of the color corresponding to the number of crimes. Remember the number of crimes per district is stored in the number_of_crimes column of crimes_per_district. We add this "chropleth layer" to our folium map.
Creating the folium maps¶
# Create a folium map centered on "downtown Chicago". The boundaries of each distrct are stored in "Boundaries.geojson".
# Use this data to draw the district regions on the folium map.
# Then shade each region "orange red" with the intensity of the color corresponding to the number of crimes.
# Remember the number of crimes per district is stored in the number_of_crimes column of crimes_per_district.
# We add this "choropleth layer" to our folium map
violent_crimes_per_district_map= folium.Map(location =(41.8781, -87.6298),zoom_start=11)
folium.Choropleth(geo_data="Boundaries.geojson",
fill_color='YlOrRd',
fill_opacity=0.5,
line_opacity=1,
data = violent_crime_data, #data source changed from first example
key_on='feature.properties.dist_num',
columns = ['dist_num', 'number_of_violent_crimes'],
legend_name = "VIOLENT CRIMES PER DISTRICT"
).add_to(violent_crimes_per_district_map)
<folium.features.Choropleth at 0x205066344a0>
In addition, for each district find the block(s) that has the highest number of gun crimes in that district. Note that there might be a tie for the highest number of gun crimes. You need to find all such blocks. Add a popup marker (located at that district's police headquarter) that displays a DataFrame containing all such block along with the number of gun crimes for that block (i.e. the highest number of crimes for a district).
# We iterate over district numbers and get all the rows in query2_df with that district number.
# This new dataframe contains the number of crimes for each of violent crime types in a particular district.
# We convert this dataframe to "html" which will be displayed in marker label (along with the district number)
# Each distric marker is added to the map at the district station's location like before.
#districts = query2_df.dist_num.unique()
for district in districts:
df2d = query2_df[query2_df.dist_num == district] # new dataframe to add to marker.
police_station_location = tuple(df2d[['st_latitude','st_longitude']].values[0])
violent_crimes_per_district_df = df2d[['Description','number_of_violent_crimes']]
# Convert out new data frame to html before inserting it into the label.
header = violent_crimes_per_district_df.to_html(classes=
'table table-striped table-hover table-condensed table-responsive')
# Create the folium marker and add it to the map.
folium.Marker(location=police_station_location, popup=folium.\
Popup(html=f"District Number {district} - Violent Crimes {header}")).\
add_to(violent_crimes_per_district_map)
# Create the marker and add it to the map.
folium.Marker(location = police_station_location,
popup =\
folium.Popup(html=f"District No : {district} - Violent Crimes {header}",max_width=450)).\
add_to(violent_crimes_per_district_map)
# Display the finished map.
violent_crimes_per_district_map
Query #3:¶
- Calculate the total number of gun related violent crimes in every district and plot that in a table on Choropleth map.
Note: A crime is considered a gun related violent crime if the word "gun" is contained in the DESCRIPTION and the PRIMARY_TYPE of the crimes is THEFT ,ASSAULT,ROBBER,KIDNAPPING,CRIM SEXUAL ASSAULT,BATTERY, or MURDER.
- Then find the total number of crimes in the district for the different DESCRIPTIONs containing the word "gun" and add a popup marker (located at that district's police headquarter) that displays a DataFrame containing this data.
Lets first create a dataframe of gun crimes per district first to get an idea about the number of gun crimes per district
# isolate violent gun crime
gun='%GUN%'
cursor.execute(f"SELECT district, count(district)\
FROM crimes\
WHERE PRIMARY_TYPE in {violent_crime_categories}\
AND DESCRIPTION::text LIKE '{gun}' GROUP BY district")
districts_violent_gun_crimes = cursor.fetchall()
districts_violent_gun_crimes_df = pd.DataFrame(districts_violent_gun_crimes, columns=['dist_num','violent_gun_crimes'])
districts_violent_gun_crimes_df['dist_num'] = districts_violent_gun_crimes_df['dist_num'].astype(str)
districts_violent_gun_crimes_df.head()
| dist_num | violent_gun_crimes | |
|---|---|---|
| 0 | 24 | 9 |
| 1 | 8 | 38 |
| 2 | 11 | 36 |
| 3 | 19 | 9 |
| 4 | 25 | 25 |
Now, lets create a dataframe of the different types of violent gun crimes for every district
cursor.execute(f"SELECT district, DESCRIPTION, count(DESCRIPTION) \
FROM crimes \
WHERE PRIMARY_TYPE in {violent_crime_categories} AND DESCRIPTION::text LIKE '{gun}' \
GROUP BY district, DESCRIPTION")
rows=cursor.fetchall()
violent_gun_crime_type = pd.DataFrame(rows, columns=['dist_num','Description','number_of_violent_gun_crimes'])
violent_gun_crime_type['dist_num'] = violent_gun_crime_type['dist_num'].astype(str)
violent_gun_crime_type.head()
| dist_num | Description | number_of_violent_gun_crimes | |
|---|---|---|---|
| 0 | 8 | ARMED: HANDGUN | 23 |
| 1 | 11 | AGGRAVATED: HANDGUN | 15 |
| 2 | 6 | AGGRAVATED: HANDGUN | 15 |
| 3 | 20 | ATTEMPT: ARMED-HANDGUN | 1 |
| 4 | 19 | AGGRAVATED: HANDGUN | 5 |
query3_df = pd.merge(violent_gun_crime_type, station_locations, on = 'dist_num')
query3_df.head()
| dist_num | Description | number_of_violent_gun_crimes | st_latitude | st_longitude | district_name | |
|---|---|---|---|---|---|---|
| 0 | 8 | ARMED: HANDGUN | 23 | 41.778987 | -87.708864 | Chicago Lawn |
| 1 | 11 | AGGRAVATED: HANDGUN | 15 | 41.873582 | -87.705488 | Harrison |
| 2 | 6 | AGGRAVATED: HANDGUN | 15 | 41.752137 | -87.644229 | Gresham |
| 3 | 20 | ATTEMPT: ARMED-HANDGUN | 1 | 41.979550 | -87.692845 | Lincoln |
| 4 | 19 | AGGRAVATED: HANDGUN | 5 | 41.947400 | -87.651512 | Town Hall |
Creating the folium maps¶
# Create a folium map centered on "downtown Chicago". The boundaries of each distrct are stored in "Boundaries.geojson".
# Use this data to draw the district regions on the folium map.
# Then shade each region "orange red" with the intensity of the color corresponding to the number of crimes.
# Remember the number of crimes per district is stored in the number_of_crimes column of crimes_per_district.
# We add this "choropleth layer" to our folium map.# Create a folium map centered on "downtown Chicago"
districts_violent_gun_crimes_map = folium.Map(location =(41.8781, -87.6298),zoom_start=11)
folium.Choropleth(geo_data="Boundaries.geojson",
fill_color='YlOrRd',
fill_opacity=0.5,
line_opacity=1,
data = districts_violent_gun_crimes_df,
key_on='feature.properties.dist_num',
columns = ['dist_num', 'violent_gun_crimes'],
legend_name="VIOLENT GUN CRIME"
).add_to(districts_violent_gun_crimes_map)
<folium.features.Choropleth at 0x2050a3ab650>
# districts = query3_df.dist_num.unique()
for district in districts:
df3d = query3_df[query3_df.dist_num == district]
police_station_location = tuple(df3d[['st_latitude','st_longitude']].values[0])
violent_gun_crimes_per_district_df = df3d[['Description','number_of_violent_gun_crimes']]
header = violent_gun_crimes_per_district_df.to_html(classes=
'table table-striped table-hover table-condensed table-responsive')
#folium.Marker(location=police_station_location, popup=folium.\
# Popup(html=f"District Number {district} - Violent GUN Crimes {header}")).\
# add_to(districts_violent_gun_crimes_map)
folium.Marker(location = police_station_location,
popup =\
folium.Popup(html=f"District No : {district} - Violent Gun Crimes {header}",max_width=450)).\
add_to(districts_violent_gun_crimes_map)
districts_violent_gun_crimes_map
Query #4:¶
- Calculate the crime density per district
district = []
tarea=[]
with open('Boundaries.geojson') as f:
data = json.load(f)
a = data['features'] # a is a list of district data (dictionaries)
for i in range(len(a)):
obj = a[i]['geometry'] # list of coordinates defining the ith district boundary
n = a[i]['properties'] # dictionary with district number and district label keyes for ith district
district.append(n['dist_num']) # add district number to the district list
tarea.append(area(obj)/10000) # add the the area (in hectares) to area list
af = pd.DataFrame({'dist_num': district,'district_area_inHectares':tarea})
af['dist_num'] = af['dist_num'].astype(str)
final_data= pd.merge(af, crimes_per_district, on='dist_num', how='inner')
final_data['crime_density'] = round(final_data['number_of_crimes']/(final_data['district_area_inHectares']/100))
final_data
| dist_num | district_area_inHectares | number_of_crimes | crime_density | |
|---|---|---|---|---|
| 0 | 17 | 2492.727155 | 386 | 15.0 |
| 1 | 20 | 1132.170216 | 254 | 22.0 |
| 2 | 19 | 2225.035732 | 608 | 27.0 |
| 3 | 25 | 2827.989237 | 726 | 26.0 |
| 4 | 14 | 1555.869965 | 508 | 33.0 |
| 5 | 7 | 1688.670732 | 745 | 44.0 |
| 6 | 3 | 1576.063931 | 695 | 44.0 |
| 7 | 4 | 7068.152865 | 790 | 11.0 |
| 8 | 6 | 2099.682124 | 844 | 40.0 |
| 9 | 22 | 3490.416073 | 442 | 13.0 |
| 10 | 5 | 3318.613379 | 635 | 19.0 |
| 11 | 24 | 1406.081387 | 412 | 29.0 |
| 12 | 16 | 8171.776367 | 473 | 6.0 |
| 13 | 8 | 5992.169760 | 886 | 15.0 |
| 14 | 18 | 1215.520046 | 869 | 71.0 |
| 15 | 12 | 2509.453028 | 700 | 28.0 |
| 16 | 11 | 1582.727274 | 1113 | 70.0 |
| 17 | 15 | 989.631393 | 530 | 54.0 |
| 18 | 10 | 2038.988883 | 779 | 38.0 |
| 19 | 1 | 1214.818895 | 854 | 70.0 |
| 20 | 9 | 3505.216898 | 585 | 17.0 |
| 21 | 2 | 1949.690970 | 602 | 31.0 |
Query #5:¶
- Create Marker Clusters on Choropleth map for those gun related crimes that resulted in arrest (green icon) and those that didn't (red icon)
# isolate gun crime
gun='%GUN%'
cursor.execute(f"SELECT district, count(district)\
FROM crimes\
WHERE DESCRIPTION::text LIKE '{gun}' GROUP BY district")
districts_gun_crimes = cursor.fetchall()
districts_gun_crimes_df = pd.DataFrame(districts_gun_crimes, columns=['dist_num','gun_crimes'])
districts_gun_crimes_df['dist_num'] = districts_gun_crimes_df['dist_num'].astype(str)
districts_gun_crimes_df.head()
| dist_num | gun_crimes | |
|---|---|---|
| 0 | 24 | 9 |
| 1 | 8 | 49 |
| 2 | 11 | 69 |
| 3 | 19 | 11 |
| 4 | 25 | 43 |
# This time our dataframe will contain one row for each district and each block in the district.
# Each row contains information about a particular gun crime including the block in which the crime occured.
# We also save the location of the crime incidendent and whether there was an arrest.
# We will use this information when creating the markers.
# In particular, markers will be positioned at the crime location instead of police station location.
data= []
for district in districts:
cursor.execute(f"""SELECT DISTINCT ON(caseno) caseno, block, DESCRIPTION, arrest,latitude,longitude\
FROM crimes\
WHERE district = {district} and DESCRIPTION::text LIKE '{gun}'""")
results = cursor.fetchall()
# insert the district number as the first element of each result list
gun_crimes_per_district = [[district]+list(result) for result in results]
data += (gun_crimes_per_district)
gun_crimes_per_district_df = pd.DataFrame(data, columns=['dist_num','caseno','block',\
'Description','arrest', 'latitude', 'longitude'])
gun_crimes_per_district_df['dist_num'] = gun_crimes_per_district_df['dist_num'].astype(str)
gun_crimes_per_district_df.shape
(745, 7)
gun_crimes_per_district_df.head()
| dist_num | caseno | block | Description | arrest | latitude | longitude | |
|---|---|---|---|---|---|---|---|
| 0 | 1 | JC105034 | 004XX W VAN BUREN ST | AGGRAVATED: HANDGUN | False | 41.876794 | -87.637817 |
| 1 | 1 | JC111601 | 008XX S PARK TER | ATTEMPT: ARMED-HANDGUN | False | 41.871096 | -87.629570 |
| 2 | 1 | JC111881 | 0000X W QUINCY ST | ATTEMPT: ARMED-HANDGUN | False | 41.878802 | -87.627917 |
| 3 | 1 | JC118978 | 022XX S MICHIGAN AVE | AGGRAVATED: HANDGUN | True | 41.852284 | -87.623790 |
| 4 | 1 | JC119606 | 029XX S DEARBORN ST | UNLAWFUL POSS OF HANDGUN | True | 41.841856 | -87.628742 |
Creating the folium maps¶
# Create a folium map centered on "downtown Chicago". The boundaries of each distrct are stored in "Boundaries.geojson".
# Use this data to draw the district regions on the folium map.
# Then shade each region "orange red" with the intensity of the color corresponding to the number of crimes.
# Remember the number of crimes per district is stored in the number_of_crimes column of crimes_per_district.
# We add this "choropleth layer" to our folium map.
gun_crime_arrests_map = folium.Map(location =(41.8781, -87.6298),zoom_start=11)
folium.Choropleth(geo_data="Boundaries.geojson",
fill_color='YlOrRd',
fill_opacity=0.5,
line_opacity=1,
data = districts_gun_crimes_df, #using data from Query 3
key_on='feature.properties.dist_num',
columns = ['dist_num', 'gun_crimes'],
legend_name="GUN CRIME"
).add_to(gun_crime_arrests_map)
marker_cluster = MarkerCluster().add_to(gun_crime_arrests_map)
# We create a marker for each gun crime--"green" if there was an arrest and "red" otherwise.
gun='%GUN%'
for district in districts:
gun_crimes_per_district_df2 = gun_crimes_per_district_df[gun_crimes_per_district_df.dist_num==district]
for index, row in gun_crimes_per_district_df2.iterrows():
description = row['Description']
block = row['block']
loc = (row['latitude'],row['longitude'])
# print("District No: %s <br> Description: %s <br> Block: %s" %(district,description,block))
if row['arrest']==True:
folium.Marker(location=loc,popup = folium.Popup(html=\
f"District No: {district} <br> Description: {description} <br> Block: {block}"),\
icon=folium.Icon(color='green', icon='ok-sign'),).add_to(marker_cluster)
# print(f"District {district} green marker at {loc}")
else:
folium.Marker(location=loc, popup = folium.Popup(html=
f"District No: {district} <br> Description: {description} <br> Block: {block}"),\
icon=folium.Icon(color='red',icon='remove-sign'),).add_to(marker_cluster)
gun_crime_arrests_map
Query #6:¶
- Plot on Choropleth map the farthest Block that has a gun crime from every police station in every district
Locate the farthest gun crime from the police station in every district. Create a Choropleth map where the districts are shaded in proportion to the number of gun crimes in that district. For each district, find the gun crime that was farthest from police station. Add a pop-up on the Choropleth map to display the district number and the Block where the farthest gun crime occurred. Also add circle marker (of radius 5) at the location of the farthest gun crime.
# This complex query returns for each block in each district the gun crime which occured furthest
# from the police station. We also store this distance together with the police station location.
# Once again we will be positioning the markers at the police station locations.
results= []
for district in districts:
cursor.execute("""SELECT DISTINCT on (A.block) A.district,A.block, A.where_is,ST_Distance(A.where_is,B.where_is)\
FROM crimes as A, police_stations as B
WHERE A.district=%s and DESCRIPTION::text LIKE %s and B.district= %s
and ST_Distance(A.where_is,B.where_is)
IN
( SELECT max(dist)
FROM
(SELECT ST_Distance(A.where_is,B.where_is) as dist
FROM crimes as A, police_stations as B
WHERE A.district=%s and DESCRIPTION::text LIKE %s and B.district= %s ) as f)""",
[district,gun,district,district,gun, district])
farthest_block_gun_crime = list(cursor.fetchall()[0])
cursor.execute(f"SELECT ST_X(ST_AsText(%s)), ST_Y(ST_AsText(%s))"
,(farthest_block_gun_crime[2],farthest_block_gun_crime[2]))
farthest_block_gun_crime[2:3] = cursor.fetchall()[0]
results.append(farthest_block_gun_crime)
farthest_gun_crime_df = pd.DataFrame(results,columns=['dist_num','block','crime_lat','crime_long','dist'])
farthest_gun_crime_df['dist_num']= farthest_gun_crime_df['dist_num'].astype(str)
query6_df = pd.merge(farthest_gun_crime_df, station_locations,on='dist_num')
query6_df.head()
| dist_num | block | crime_lat | crime_long | dist | st_latitude | st_longitude | district_name | |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 004XX W VAN BUREN ST | 41.876794 | -87.637817 | 1171.461889 | 41.858373 | -87.627356 | Central |
| 1 | 2 | 055XX S EVERETT AVE | 41.794425 | -87.582519 | 5365.880774 | 41.801811 | -87.630560 | Wentworth |
| 2 | 3 | 026XX E 75TH ST | 41.759100 | -87.560145 | 5093.598187 | 41.766431 | -87.605748 | Grand Crossing |
| 3 | 4 | 101XX S EWING AVE | 41.711374 | -87.535281 | 3693.438062 | 41.707933 | -87.568349 | South Chicago |
| 4 | 5 | 118XX S LAFLIN ST | 41.678829 | -87.658995 | 6086.359525 | 41.692723 | -87.604506 | Calumet |
Creating the folium maps¶
# Create a folium map centered on "downtown Chicago". The boundaries of each distrct are stored in "Boundaries.geojson".
# Use this data to draw the district regions on the folium map.
# Then shade each region "orange red" with the intensity of the color corresponding to the number of crimes.
# Remember the number of crimes per district is stored in the number_of_crimes column of crimes_per_district.
# We add this "choropleth layer" to our folium map.
farthest_block_gun_crime_map = folium.Map(location =(41.8781, -87.6298),zoom_start=11)
folium.Choropleth(geo_data="Boundaries.geojson",
fill_color='YlOrRd',
fill_opacity=0.5,
line_opacity=1,
data = districts_gun_crimes_df,
key_on='feature.properties.dist_num',
columns = ['dist_num', 'gun_crimes'],
legend_name="GUN CRIME"
).add_to(farthest_block_gun_crime_map)
<folium.features.Choropleth at 0x20508b58110>
# This time we place a circle markers at the crime locations in addition to the "regular" markers at the
# police station locations.
for district in districts:
data = query6_df.loc[query6_df.dist_num==district]
# print(data)
# print(data['block'].values)
block, crime_lat, crime_long, dist, police_lat, police_long =\
data['block'].values[0], data['crime_lat'].values[0], data['crime_long'].values[0],\
data['dist'].values[0], data['st_latitude'].values[0], data['st_longitude'].values[0]
folium.CircleMarker((crime_lat,crime_long),radius=5,color='#ff3187',
popup=folium.Popup(html=f"District No.: {district} <br> Block:{block}")).\
add_to(farthest_block_gun_crime_map)
folium.Marker(location=(police_lat,police_long),
popup=folium.Popup(\
html=f"Police Station <br> District No.: {district} <br> Farthest Gun_Crime Block:{block}")).\
add_to(farthest_block_gun_crime_map)
farthest_block_gun_crime_map
Requirements¶
** The HTML document your are submitting must have the source code and the output for the following requirements **
Requirement #1: (15 points)¶
- Use SQL to select all rows of data with the following fields from crimes - district, ward, arrest, primary_type and location_description.
- Create a dataframe called new_df with the selected fields.
- Make sure you name the columns in the dataframe.
- Show your new_df using info() and head().
# SQL query to select the desired columns from the 'crimes' table
query = '''
SELECT district, ward, arrest, primary_type, location_description
FROM crimes
'''
# Execute the query and create a DataFrame
new_df = pd.read_sql_query(query, db_connection)
# Display the DataFrame information and the first few rows
print(new_df.info())
print(new_df.head())
C:\Users\Ritesh Kumar\AppData\Local\Temp\ipykernel_2512\2830133133.py:8: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy. new_df = pd.read_sql_query(query, db_connection)
<class 'pandas.core.frame.DataFrame'> RangeIndex: 14436 entries, 0 to 14435 Data columns (total 5 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 district 14436 non-null int64 1 ward 14436 non-null float64 2 arrest 14436 non-null bool 3 primary_type 14436 non-null object 4 location_description 14436 non-null object dtypes: bool(1), float64(1), int64(1), object(2) memory usage: 465.4+ KB None district ward arrest primary_type location_description 0 2 3.0 False HOMICIDE AUTO 1 2 3.0 False HOMICIDE AUTO 2 14 2.0 False HOMICIDE STREET 3 8 13.0 False HOMICIDE STREET 4 6 21.0 True HOMICIDE PARKING LOT
Requirement #2: (10 points)¶
- Using new_df, show a crosstab of primary_type with the variable called arrest; Show the results as percentages.
- Use a Markdown cell to comment on what the results for the crime type of Assault tell us.
# Calculate the crosstab and normalize by row to get percentages
crosstab_percentages = pd.crosstab(new_df['primary_type'], new_df['arrest'], normalize='index') * 100
# Rename the columns for clarity
crosstab_percentages.columns = ['No Arrest (%)', 'Arrest (%)']
# Sort the DataFrame by 'Arrest (%)' in descending order
crosstab_percentages_sorted = crosstab_percentages.sort_values('Arrest (%)', ascending=False)
# Name the index for clarity, which effectively names the first column
crosstab_percentages_sorted.index.name = 'Primary Crime'
# Display the sorted crosstab with percentages
print(crosstab_percentages_sorted)
No Arrest (%) Arrest (%) Primary Crime GAMBLING 0.000000 100.000000 PROSTITUTION 0.000000 100.000000 OBSCENITY 0.000000 100.000000 CONCEALED CARRY LICENSE VIOLATION 0.000000 100.000000 NARCOTICS 0.000000 100.000000 LIQUOR LAW VIOLATION 0.000000 100.000000 INTERFERENCE WITH PUBLIC OFFICER 4.938272 95.061728 PUBLIC PEACE VIOLATION 26.562500 73.437500 WEAPONS VIOLATION 32.258065 67.741935 CRIMINAL TRESPASS 40.780911 59.219089 OTHER OFFENSE 76.792829 23.207171 STALKING 77.777778 22.222222 BATTERY 79.823918 20.176082 ASSAULT 80.659945 19.340055 HOMICIDE 86.666667 13.333333 THEFT 89.836449 10.163551 SEX OFFENSE 93.181818 6.818182 CRIMINAL DAMAGE 95.264242 4.735758 ROBBERY 95.419847 4.580153 CRIM SEXUAL ASSAULT 95.744681 4.255319 MOTOR VEHICLE THEFT 95.901639 4.098361 DECEPTIVE PRACTICE 95.949367 4.050633 OFFENSE INVOLVING CHILDREN 97.979798 2.020202 BURGLARY 99.058085 0.941915 ARSON 100.000000 0.000000 NON-CRIMINAL 100.000000 0.000000 INTIMIDATION 100.000000 0.000000 HUMAN TRAFFICKING 100.000000 0.000000 KIDNAPPING 100.000000 0.000000
The 'Assault' data indicates that only about 19.34% of Assault cases result in an arrest, suggesting challenges in the investigation and prosecution of these incidents. This low arrest rate could reflect difficulties in gathering evidence, victim reluctance to press charges, or prioritization of resources by law enforcement.
Requirement #3: (10 points)¶
- Calculate the gun crimes density in every district. (See Query 4.)
# Merge the dataframes on 'dist_num'
gun_density_data = pd.merge(final_data, districts_gun_crimes_df, on='dist_num', how='inner')
# Calculate crime density
gun_density_data['gun_crime_density'] = round(gun_density_data['gun_crimes'] / (gun_density_data['district_area_inHectares'] / 100), 5)
# Sort by gun_crime_density
gun_density_data_sorted = gun_density_data.sort_values('gun_crime_density', ascending=False)
# Display the final DataFrame
gun_density_data_sorted [['dist_num','gun_crime_density']]
| dist_num | gun_crime_density | |
|---|---|---|
| 16 | 11 | 4.35956 |
| 5 | 7 | 3.96762 |
| 6 | 3 | 3.61660 |
| 17 | 15 | 3.43562 |
| 8 | 6 | 3.00045 |
| 18 | 10 | 2.84455 |
| 21 | 2 | 1.74387 |
| 3 | 25 | 1.52051 |
| 15 | 12 | 1.51427 |
| 10 | 5 | 1.32585 |
| 14 | 18 | 1.15177 |
| 20 | 9 | 1.11263 |
| 13 | 8 | 0.81773 |
| 4 | 14 | 0.77127 |
| 7 | 4 | 0.70740 |
| 9 | 22 | 0.68760 |
| 11 | 24 | 0.64008 |
| 2 | 19 | 0.49437 |
| 19 | 1 | 0.49390 |
| 1 | 20 | 0.44163 |
| 0 | 17 | 0.28082 |
| 12 | 16 | 0.14685 |
- Create the data frame with the appropriate marker data.
- Create the folium base map.
- Add the Choropleth layer.
- Iterate over the data frame you created to place the markers on the folium map.
- Show the final map.
Requirement #4: (15 points)¶
- Using Query #2 as an example, create a Choropleth map for the violent crimes that resulted in an arrest.
- The pins should be placed at the police station for each district.
- The legend should reflect the total violent crime count per district.
- The popup should show the number of total violent crimes per hour interval for that district. If you choose to extract the hour from date_of_occurrence using SQL, check out https://www.postgresql.org/docs/current/functions-datetime.html.
# Run the SQL Query
cursor.execute("""SELECT district, \
date_part('hour', date_of_occurrence) AS hour, \
COUNT(caseno) AS violent_crimes_arrests \
FROM crimes \
WHERE arrest = 'TRUE'
AND primary_type IN ('THEFT', 'ASSAULT', 'ROBBERY', 'KIDNAPPING', 'CRIM SEXUAL ASSAULT', 'BATTERY', 'MURDER') \
GROUP BY district, hour \
ORDER BY district, hour;""")
rows = cursor.fetchall()
# Create the dataframe
violent_crime_arrest_data = pd.DataFrame(rows, columns=['dist_num', 'hour' ,'number_of_violent_crimes_arrest'])
violent_crime_arrest_data.head(25)
| dist_num | hour | number_of_violent_crimes_arrest | |
|---|---|---|---|
| 0 | 1 | 0.0 | 2 |
| 1 | 1 | 2.0 | 1 |
| 2 | 1 | 3.0 | 1 |
| 3 | 1 | 8.0 | 3 |
| 4 | 1 | 9.0 | 1 |
| 5 | 1 | 10.0 | 3 |
| 6 | 1 | 11.0 | 3 |
| 7 | 1 | 12.0 | 4 |
| 8 | 1 | 13.0 | 4 |
| 9 | 1 | 14.0 | 6 |
| 10 | 1 | 15.0 | 7 |
| 11 | 1 | 16.0 | 10 |
| 12 | 1 | 17.0 | 8 |
| 13 | 1 | 18.0 | 5 |
| 14 | 1 | 19.0 | 4 |
| 15 | 1 | 20.0 | 11 |
| 16 | 1 | 21.0 | 2 |
| 17 | 1 | 23.0 | 2 |
| 18 | 2 | 1.0 | 2 |
| 19 | 2 | 2.0 | 1 |
| 20 | 2 | 3.0 | 1 |
| 21 | 2 | 4.0 | 1 |
| 22 | 2 | 5.0 | 1 |
| 23 | 2 | 6.0 | 1 |
| 24 | 2 | 7.0 | 3 |
# Cast 'dist_num' to a string
violent_crime_arrest_data['dist_num'] = violent_crime_arrest_data['dist_num'].astype(str)
# Merge dataframes
r4_df = pd.merge(violent_crime_arrest_data, station_locations, on='dist_num')
r4_df.head(20)
| dist_num | hour | number_of_violent_crimes_arrest | st_latitude | st_longitude | district_name | |
|---|---|---|---|---|---|---|
| 0 | 1 | 0.0 | 2 | 41.858373 | -87.627356 | Central |
| 1 | 1 | 2.0 | 1 | 41.858373 | -87.627356 | Central |
| 2 | 1 | 3.0 | 1 | 41.858373 | -87.627356 | Central |
| 3 | 1 | 8.0 | 3 | 41.858373 | -87.627356 | Central |
| 4 | 1 | 9.0 | 1 | 41.858373 | -87.627356 | Central |
| 5 | 1 | 10.0 | 3 | 41.858373 | -87.627356 | Central |
| 6 | 1 | 11.0 | 3 | 41.858373 | -87.627356 | Central |
| 7 | 1 | 12.0 | 4 | 41.858373 | -87.627356 | Central |
| 8 | 1 | 13.0 | 4 | 41.858373 | -87.627356 | Central |
| 9 | 1 | 14.0 | 6 | 41.858373 | -87.627356 | Central |
| 10 | 1 | 15.0 | 7 | 41.858373 | -87.627356 | Central |
| 11 | 1 | 16.0 | 10 | 41.858373 | -87.627356 | Central |
| 12 | 1 | 17.0 | 8 | 41.858373 | -87.627356 | Central |
| 13 | 1 | 18.0 | 5 | 41.858373 | -87.627356 | Central |
| 14 | 1 | 19.0 | 4 | 41.858373 | -87.627356 | Central |
| 15 | 1 | 20.0 | 11 | 41.858373 | -87.627356 | Central |
| 16 | 1 | 21.0 | 2 | 41.858373 | -87.627356 | Central |
| 17 | 1 | 23.0 | 2 | 41.858373 | -87.627356 | Central |
| 18 | 2 | 1.0 | 2 | 41.801811 | -87.630560 | Wentworth |
| 19 | 2 | 2.0 | 1 | 41.801811 | -87.630560 | Wentworth |
# Step 1: Calculate the total number of violent crimes arrests per district
total_crimes_per_district = r4_df.groupby('dist_num')['number_of_violent_crimes_arrest'].sum().reset_index()
total_crimes_per_district.columns = ['dist_num', 'total_violent_crimes_arrests']
# Step 2: Create a folium map with choropleth layers and markers
violent_crimes_arrests_per_district_map = folium.Map(location=(41.8781, -87.6298), zoom_start=11)
folium.Choropleth(
geo_data="Boundaries.geojson",
name="choropleth",
data=total_crimes_per_district,
columns=['dist_num', 'total_violent_crimes_arrests'],
key_on='feature.properties.dist_num', # Ensure this key matches the GeoJSON properties
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name="Total Number of Violent Crimes Arrests per District"
).add_to(violent_crimes_arrests_per_district_map)
# Step 3: Update the folium popups to include a table with the number of crimes for every hour for each district
for district in r4_df['dist_num'].unique():
# Filter the data for this district
district_data = r4_df[r4_df['dist_num'] == district]
# If district name is a column in the DataFrame
district_name = district_data['district_name'].iloc[0] if 'district_name' in district_data.columns else 'Unknown'
total_violent_crimes_arrests = total_crimes_per_district[total_crimes_per_district['dist_num'] == district]['total_violent_crimes_arrests'].iloc[0]
# Create an HTML table for crimes by hour
hourly_crimes_html = district_data[['hour', 'number_of_violent_crimes_arrest']].sort_values('hour').to_html(classes='table table-striped table-hover table-condensed table-responsive', index=False, border=0)
# Prepare the HTML for the popup
popup_html = f"<b>District Number: {district} - {district_name}</b><br>Total Violent Crimes Arrests: {total_violent_crimes_arrests}<br>{hourly_crimes_html}"
# Get the coordinates for the district
location = district_data[['st_latitude', 'st_longitude']].iloc[0].tolist()
# Create the folium marker and add it to the map
folium.Marker(
location=location,
popup=folium.Popup(html=popup_html, max_width=450)
).add_to(violent_crimes_arrests_per_district_map)
# Add layer control to toggle the choropleth
folium.LayerControl().add_to(violent_crimes_arrests_per_district_map)
# Display the finished map
violent_crimes_arrests_per_district_map
Requirement #5: (15 points)¶
- Create Marker Clusters on Choropleth map for those gun related crimes that have Location Description as RESIDENCE in (green icon) and those that have Location Description as STREET in (red icon) (See Query 5.)
- The legend should also reflect gun related crimes for STREET and RESIDENCE only.
- Make sure to place the markers at the crime location.
# Run the SQL Query to extract gun_crimes that took place only in 'street' or 'residence'
cursor.execute("""SELECT district,
caseno, \
location_description, \
block, \
description, \
latitude, \
longitude \
FROM crimes \
WHERE location_description IN ('RESIDENCE', 'STREET') \
AND description::text ILIKE '%gun%' \
ORDER BY district""")
rows = cursor.fetchall()
# Create the dataframe
street_resi_gun_crimes = pd.DataFrame(rows, columns=['dist_num', 'caseno', 'location_description', 'block', 'description', 'latitude', 'longitude'])
street_resi_gun_crimes.head(25)
| dist_num | caseno | location_description | block | description | latitude | longitude | |
|---|---|---|---|---|---|---|---|
| 0 | 1 | JC105034 | STREET | 004XX W VAN BUREN ST | AGGRAVATED: HANDGUN | 41.876794 | -87.637817 |
| 1 | 2 | JC111769 | STREET | 050XX S DREXEL BLVD | AGGRAVATED: HANDGUN | 41.803105 | -87.604230 |
| 2 | 2 | JC109462 | STREET | 006XX E 41ST ST | AGGRAVATED: HANDGUN | 41.820329 | -87.610502 |
| 3 | 2 | JC100239 | STREET | 051XX S DR MARTIN LUTHER KING JR DR | AGGRAVATED: HANDGUN | 41.801635 | -87.616148 |
| 4 | 2 | JC113911 | STREET | 039XX S COTTAGE GROVE AVE | AGGRAVATED: HANDGUN | 41.823756 | -87.606868 |
| 5 | 2 | JC120171 | STREET | 002XX E 43RD ST | ARMED: HANDGUN | 41.816573 | -87.620758 |
| 6 | 2 | JC112133 | STREET | 053XX S WABASH AVE | UNLAWFUL USE HANDGUN | 41.797332 | -87.624158 |
| 7 | 2 | JC112322 | STREET | 047XX S ELLIS AVE | UNLAWFUL POSS OF HANDGUN | 41.808819 | -87.601685 |
| 8 | 2 | JC104646 | STREET | 043XX S FEDERAL ST | AGGRAVATED: HANDGUN | 41.815905 | -87.628032 |
| 9 | 2 | JC117260 | STREET | 050XX S STATE ST | UNLAWFUL POSS OF HANDGUN | 41.802826 | -87.625913 |
| 10 | 2 | JC125025 | STREET | 054XX S DREXEL AVE | ARMED: HANDGUN | 41.797159 | -87.603872 |
| 11 | 2 | JC116587 | STREET | 056XX S WENTWORTH AVE | ARMED: HANDGUN | 41.791613 | -87.630422 |
| 12 | 2 | JC104830 | STREET | 005XX E 47TH ST | AGGRAVATED: HANDGUN | 41.809433 | -87.613321 |
| 13 | 2 | JC117886 | STREET | 050XX S ST LAWRENCE AVE | AGGRAVATED: HANDGUN | 41.803482 | -87.611315 |
| 14 | 2 | JC118457 | STREET | 053XX S CORNELL AVE | ARMED: HANDGUN | 41.799134 | -87.585508 |
| 15 | 2 | JC122052 | STREET | 037XX S LAKE SHORE DR NB | AGGRAVATED: HANDGUN | 41.828738 | -87.603561 |
| 16 | 2 | JC124961 | RESIDENCE | 044XX S GREENWOOD AVE | ARMED: HANDGUN | 41.814133 | -87.599358 |
| 17 | 3 | JC121676 | STREET | 071XX S JEFFERY BLVD | UNLAWFUL POSS OF HANDGUN | 41.764929 | -87.576273 |
| 18 | 3 | JC126309 | STREET | 019XX E 72ND ST | ARMED: HANDGUN | 41.764321 | -87.577589 |
| 19 | 3 | JC100317 | STREET | 074XX S STATE ST | AGGRAVATED: HANDGUN | 41.758433 | -87.624766 |
| 20 | 3 | JC124958 | RESIDENCE | 005XX E 74TH ST | AGGRAVATED: HANDGUN | 41.760285 | -87.612327 |
| 21 | 3 | JC100002 | STREET | 061XX S EBERHART AVE | UNLAWFUL POSS OF HANDGUN | 41.783120 | -87.613267 |
| 22 | 3 | JC122679 | STREET | 067XX S EBERHART AVE | GUN OFFENDER: ANNUAL REGISTRATION | 41.772556 | -87.613019 |
| 23 | 3 | JC115977 | STREET | 017XX E 68TH ST | AGGRAVATED: HANDGUN | 41.771520 | -87.583204 |
| 24 | 3 | JC119755 | STREET | 072XX S PHILLIPS AVE | ARMED: HANDGUN | 41.763991 | -87.565562 |
import folium
from folium.plugins import MarkerCluster
# Assuming 'street_resi_gun_crimes' is your DataFrame and 'Boundaries.geojson' is the path to your geojson file.
# Create a map centered on downtown Chicago
gun_crime_arrests_map = folium.Map(location=[41.8781, -87.6298], zoom_start=11)
# Calculate the number of gun crimes per district
gun_crimes_per_district = street_resi_gun_crimes.groupby('dist_num').size().reset_index(name='gun_crimes')
# Add the choropleth layer for gun crimes per district
folium.Choropleth(
geo_data="Boundaries.geojson",
name="choropleth",
data=gun_crimes_per_district,
columns=['dist_num', 'gun_crimes'],
key_on='feature.properties.dist_num', # Adjust this as necessary to match your GeoJSON properties
fill_color='YlOrRd', # Adjust color as needed
fill_opacity=0.7,
line_opacity=0.2,
legend_name="Gun-Related Crimes per District"
).add_to(gun_crime_arrests_map)
# Initialize the MarkerCluster object
marker_cluster = MarkerCluster().add_to(gun_crime_arrests_map)
# Loop through the DataFrame to create markers for each gun crime
for _, row in street_resi_gun_crimes.iterrows():
location = (row['latitude'], row['longitude'])
loc_desc = row['location_description'] # Get the location description
popup_text = f"Case No: {row['caseno']}<br>Location: {loc_desc}<br>Block: {row['block']}<br>Description: {row['description']}"
icon_color = 'green' if loc_desc == 'RESIDENCE' else 'red'
folium.Marker(
location=location,
popup=folium.Popup(popup_text, max_width=300),
icon=folium.Icon(color=icon_color)
).add_to(marker_cluster)
# Display the map
gun_crime_arrests_map
Requirement #6: (15 points)¶
- Create a seaborn scatterplot to show
THEFT,BATTERY,ASSAULT,ROBBERYby the hour of the day. (Reference: https://seaborn.pydata.org/generated/seaborn.scatterplot.html) - You can use SQL and/or Python to prepare the data for the plot. If you choose to extract the hour from date_of_occurrence using SQL, check out https://www.postgresql.org/docs/current/functions-datetime.html.
- Number of crimes will be on the y-axis and hour of day will be on the x-axis. Be sure to provide a title and a legend for your plot.
- Using a markdown cell, explain the findings from the plot.
# Run the SQL Query to extract number of `THEFT`, `BATTERY`, `ASSAULT`, `ROBBERY` by the hour of the day
cursor.execute("""SELECT date_part('hour', date_of_occurrence) AS hour, \
SUM(CASE WHEN primary_type = 'THEFT' THEN 1 ELSE 0 END) AS theft_count, \
SUM(CASE WHEN primary_type = 'BATTERY' THEN 1 ELSE 0 END) AS battery_count, \
SUM(CASE WHEN primary_type = 'ASSAULT' THEN 1 ELSE 0 END) AS assault_count, \
SUM(CASE WHEN primary_type = 'ROBBERY' THEN 1 ELSE 0 END) AS robbery_count \
FROM crimes \
WHERE primary_type IN ('THEFT', 'BATTERY', 'ASSAULT', 'ROBBERY') \
GROUP BY hour \
ORDER BY hour;""")
rows = cursor.fetchall()
# Create the dataframe
df = pd.DataFrame(rows, columns=['hour', 'theft_count', 'battery_count', 'assault_count', 'robbery_count'])
# Display the first 25 rows of the dataframe
df.head()
| hour | theft_count | battery_count | assault_count | robbery_count | |
|---|---|---|---|---|---|
| 0 | 0.0 | 116 | 130 | 23 | 22 |
| 1 | 1.0 | 63 | 128 | 30 | 21 |
| 2 | 2.0 | 53 | 119 | 32 | 24 |
| 3 | 3.0 | 52 | 100 | 18 | 14 |
| 4 | 4.0 | 47 | 66 | 18 | 17 |
# Set the aesthetic style of the plots with a dark background
sns.set(style="ticks", context="talk")
# Create the scatterplot with a larger figure size for better readability
plt.figure(figsize=(12, 8))
# Plot each crime type with different markers and sizes for better visibility
sns.scatterplot(data=df, x='hour', y='theft_count', label='Theft', color='navy', s=100, marker='o', edgecolor='w')
sns.scatterplot(data=df, x='hour', y='battery_count', label='Battery', color='peru', s=100, marker='s', edgecolor='w')
sns.scatterplot(data=df, x='hour', y='assault_count', label='Assault', color='green', s=100, marker='^', edgecolor='w')
sns.scatterplot(data=df, x='hour', y='robbery_count', label='Robbery', color='darkred', s=100, marker='X', edgecolor='w')
# Customize the axes and title with a larger font size
plt.title('Crime Counts by Hour of the Day', fontsize=20)
plt.xlabel('Hour of the Day', fontsize=16)
plt.ylabel('Number of Crimes', fontsize=16)
# Customize legend
plt.legend(title='Crime Types', title_fontsize='13', fontsize='12')
# Highlight the x-axis hours for better readability
plt.xticks(range(0, 24, 1))
# Add gridlines for better alignment
plt.grid(True, which='both', linestyle='--', linewidth=0.5)
# Show the plot with a tight layout to ensure good spacing
plt.tight_layout()
plt.show()
from nbconvert import HTMLExporter
import nbformat
def save_notebook_as_html(notebook_path, output_path):
# Read the notebook
notebook_node = nbformat.read(notebook_path, as_version=4)
html_exporter = HTMLExporter()
# Convert to HTML
body, resources = html_exporter.from_notebook_node(notebook_node)
with open(output_path, 'w') as f:
f.write(body)
notebook_path = 'Kumar_Assignment_4.ipynb'
output_path = 'Kumar_Assignment_4.html'
save_notebook_as_html(notebook_path, output_path)